Efficient Genome Wide Tagging by Reduction to SAT

Authors

  • Arthur Choi
  • Noah Zaitlen
  • Buhm Han
  • Knot Pipatsrisawat
  • Adnan Darwiche
  • Eleazar Eskin

Abstract

Whole-genome association studies have recently demonstrated some remarkable successes in identifying loci involved in disease. Designing these studies involves selecting a subset of known single nucleotide polymorphisms (SNPs), or tag SNPs, to be genotyped. The problem of choosing tag SNPs is an active area of research and is usually formulated such that the goal is to select the smallest number of tag SNPs that “cover” the remaining SNPs, where “cover” is defined by some statistical criterion. Since the standard formulation of the tag SNP selection problem is NP-hard, most algorithms for selecting tag SNPs are either heuristics, which do not guarantee selection of the minimal set of tag SNPs, or exhaustive algorithms, which are computationally impractical. In this paper, we present a set of methods that guarantee discovering the minimal set of tag SNPs, yet in practice are much faster than traditional exhaustive algorithms. We demonstrate that our methods can be applied to discover minimal tag sets for the entire human genome. Our method converts the instance of the tag SNP selection problem to an instance of the satisfiability problem, encoding the instance into conjunctive normal form (CNF). We take advantage of the local structure inherent in human variation, as well as progress in knowledge compilation, and convert our CNF encoding into a representation known as DNNF, from which solutions to our original problem can be easily enumerated. We demonstrate our methods by constructing the optimal tag set for the whole genome and show that we significantly outperform previous exhaustive search-based methods. We also present optimal solutions for the problem of selecting multi-marker tags, in which some SNPs are “covered” by a pair of tag SNPs. Multi-marker tags can significantly decrease the number of tags we need to select; however, discovering the minimal number of ...

K.A. Crandall and J. Lagergren (Eds.): WABI 2008, LNBI 5251, pp. 135–147, 2008.
© Springer-Verlag Berlin Heidelberg 2008
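The reduction described in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that tag SNP i covers SNP j when their pairwise r² is at least a threshold, and that multi-marker coverage is supplied as an explicit list of tag pairs; the function names, the toy r² matrix `R2`, the threshold 0.8 and the pair list are hypothetical and are not taken from the paper. Each SNP contributes one clause requiring at least one covering tag to be selected, and each tag pair gets an auxiliary variable that implies both of its tags. The sketch then finds a minimum tag set by enumerating candidate sets in order of increasing size, which is the kind of exhaustive search the paper reports outperforming; it does not implement the authors' DNNF compilation.

```python
from itertools import combinations

# Hypothetical toy data: symmetric pairwise r^2 values for 5 SNPs (diagonal = 1).
R2 = [
    [1.0, 0.9, 0.1, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1, 0.0],
    [0.1, 0.2, 1.0, 0.85, 0.3],
    [0.1, 0.1, 0.85, 1.0, 0.4],
    [0.0, 0.0, 0.3, 0.4, 1.0],
]


def coverage_clauses(r2, threshold, pair_covers=None):
    """Encode tag SNP selection as CNF clauses (DIMACS-style integer literals).

    Variable i+1 means "SNP i is selected as a tag". Assumed cover rule:
    tag i covers SNP j when r2[i][j] >= threshold. pair_covers optionally
    maps SNP j to (a, b) tag pairs that jointly cover it (multi-marker tags).
    Returns (clauses, number_of_variables, pair_variable_map).
    """
    n = len(r2)
    clauses = []
    pair_var = {}
    next_var = n + 1
    for j in range(n):
        # "At least one tag covering SNP j is selected."
        clause = [i + 1 for i in range(n) if r2[i][j] >= threshold]
        for a, b in (pair_covers or {}).get(j, []):
            key = (min(a, b), max(a, b))
            if key not in pair_var:
                y = next_var
                next_var += 1
                pair_var[key] = y
                # Auxiliary y implies both tags: (-y OR x_a) and (-y OR x_b).
                clauses.append([-y, key[0] + 1])
                clauses.append([-y, key[1] + 1])
            clause.append(pair_var[key])
        clauses.append(clause)
    return clauses, next_var - 1, pair_var


def to_dimacs(num_vars, clauses):
    """Serialize clauses in DIMACS CNF format (no cardinality constraint)."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)


def satisfied(clauses, true_vars):
    """True if every clause has a literal made true by the assignment."""
    return all(
        any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
        for clause in clauses
    )


def min_tag_set(n, clauses, pair_var):
    """Exhaustive search: try tag sets in order of increasing size (not DNNF)."""
    for k in range(n + 1):
        for tags in combinations(range(1, n + 1), k):
            true_vars = set(tags)
            # Turn on each pair variable exactly when both of its tags are chosen.
            true_vars |= {y for (a, b), y in pair_var.items()
                          if a + 1 in tags and b + 1 in tags}
            if satisfied(clauses, true_vars):
                return [t - 1 for t in tags]  # back to 0-based SNP indices
    return None


clauses, num_vars, pairs = coverage_clauses(R2, 0.8)
print(to_dimacs(num_vars, clauses))
print(min_tag_set(len(R2), clauses, pairs))  # [0, 2, 4]

# Multi-marker variant: let the pair (1, 3) jointly cover SNP 4.
clauses2, num_vars2, pairs2 = coverage_clauses(R2, 0.8, {4: [(1, 3)]})
print(min_tag_set(len(R2), clauses2, pairs2))  # [1, 3]
```

The CNF itself only states which tag sets are valid; minimality here comes from the search order (an external SAT solver would additionally need a cardinality constraint, which this sketch does not encode). In the toy data, allowing SNP 4 to be covered by the pair (1, 3) reduces the minimum tag set from three SNPs to two.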

Similar resources

Efficient algorithms for genome-wide tagSNP selection across populations via the linkage disequilibrium criterion.

In this paper, we study the tagSNP selection problem on multiple populations using the pairwise r² linkage disequilibrium criterion. We propose a novel combinatorial optimization model for the tagSNP selection problem, called the minimum common tagSNP selection (MCTS) problem, and present efficient solutions for MCTS. Our approach consists of three main steps including (i) partitioning the SN...

Optimal haplotype block-free selection of tagging SNPs for genome-wide association studies.

It is widely hoped that the study of sequence variation in the human genome will provide a means of elucidating the genetic component of complex diseases and variable drug responses. A major stumbling block to the successful design and execution of genome-wide disease association studies using single-nucleotide polymorphisms (SNPs) and linkage disequilibrium is the enormous number of SNPs in th...

Efficient Genome-Wide TagSNP Selection Across Populations via the Linkage Disequilibrium Criterion

In this article, we studied the tag single-nucleotide polymorphism (tagSNP) selection problem on multiple populations using the pairwise r² linkage disequilibrium criterion. We proposed a novel combinatorial optimization model for the tagSNP selection problem, called the minimum common tagSNP selection (MCTS) problem, and presented efficient solutions for MCTS. Our approach consists of the fo...

A part-of-speech tagging system for the Persian language

Abstract: Part-of-speech (POS) tagging is essential for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

Recombineering-mediated tagging of Drosophila genomic constructs for in vivo localization and acute protein inactivation

Studying gene function in the post-genome era requires methods to localize and inactivate proteins in a standardized fashion in model organisms. While genome-wide gene disruption and over-expression efforts are well on their way to vastly expand the repertoire of Drosophila tools, a complementary method to efficiently and quickly tag proteins expressed under endogenous control does not exist fo...

Publication date: 2008